Abstract — This paper introduces a novel, well-founded, be-
tweenness measure, called the Bag-of-Paths (BoP) betweenness, as
well as its extension, the BoP group betweenness, to tackle semi- 
supervised classification problems on weighted directed graphs. 
The objective of semi-supervised classification is to assign a label 
to unlabeled nodes using the whole topology of the graph and 
the labeled nodes at our disposal. The BoP betweenness relies on 
a bag-of-paths framework [1] assigning a Boltzmann distribution
on the set of all possible paths through the network such that long 
(high-cost) paths have a low probability of being picked from the 
bag, while short (low-cost) paths have a high probability of being 
picked. Within that context, the BoP betweenness of node j is 
defined as the sum of the a posteriori probabilities that node j lies 
in-between two arbitrary nodes i, k, when picking a path starting 
in i and ending in k. Intuitively, a node typically receives a high 
betweenness if it has a large probability of appearing on paths 
connecting two arbitrary nodes of the network. This quantity can 
be computed in closed form by inverting an n x n matrix where
n is the number of nodes. For the group betweenness, the paths 
are constrained to start and end in nodes within the same class, 
therefore defining a group betweenness for each class. Unlabeled 
nodes are then classified according to the class showing the 
highest group betweenness. Experiments on various real-world 
data sets show that BoP group betweenness outperforms all the 
tested state-of-the-art methods [2]-[5]. The benefit of the BoP
betweenness is particularly noticeable when only a few labeled 
nodes are available. 

Index Terms — Graph and network analysis, network data, 
graph mining, betweenness centrality, kernels on graphs, semi- 
supervised classification. 

I. Introduction 

As is well known, the goal of a classification task is to
automatically assign data to predefined classes. Traditional
pattern recognition, machine learning or data mining
classification methods require large amounts of labeled training
instances, which are often difficult to obtain. The effort
required to label the data can be reduced using, for example,
semi-supervised learning methods. This name comes from the
fact that the data used is a mixture of the data used for supervised
and unsupervised learning (see, e.g., [6], [7] for a comprehensive
introduction). Indeed, semi-supervised learning methods
learn from both labeled and unlabeled instances. This reduces
the number of labeled instances needed to achieve
the same level of classification accuracy.

Graph-based semi-supervised classification has received a 
growing focus in recent years. The problem can be described 
as follows: given an input graph with some nodes labeled, the
problem is to predict the missing node labels. This problem 
has numerous applications such as classification of individ- 
uals in social networks, linked documents (e.g. patents or 
scientific papers) categorization, or protein function prediction, 
to name a few. In this kind of application (as in many 
others), unlabeled data are usually available in large quantities 
and are easy to collect: friendship links can be recorded on 
Facebook, text documents can be crawled from the internet 
and DNA sequences of proteins are readily available from 
gene databases. Given a relatively small labeled data set and a 
large unlabeled data set, semi-supervised algorithms can infer 
useful information from both sources. 

Still another way to reduce the effort required to label the 
training data is to use an active learning framework. Active 
learning methods reduce the number of labeled data required 
for learning by intelligently choosing which instance to ask to 
be labeled next (see, e.g., [8]). However, this second approach
will not be studied in this paper and is left for future work. 

This paper tackles the graph-based semi-supervised classification
problem within the bag-of-paths (BoP) framework [1], which captures
the global structure of the graph with, as building blocks, network
paths. More precisely,
we assume a weighted directed graph or network G where 
a cost is associated to each arc. We further consider a bag 
containing all the possible paths (also called walks) between 
pairs of nodes in G. Then, a Boltzmann distribution, depending 
on a temperature parameter T, is defined on the set of paths 
such that long (high-cost) paths have a low probability of being 
picked from the bag, while short (low-cost) paths have a high 
probability of being picked. In this probabilistic framework, 
the BoP probabilities, P(s = i,e = j), of sampling a path 
starting in node i and ending in node j can easily be computed 
in closed form by a simple n x n matrix inversion where n is
the number of nodes. 

Within this context, a betweenness measure quantifying to 
which extent a node j is in between two nodes i and k is 
defined. More precisely, the BoP betweenness of a node j of
interest is defined quite naturally as the sum of the a posteriori
probabilities that node j (intermediate node) lies in between
two arbitrary nodes i, k,
$\mathrm{bet}_j = \sum_{i=1}^{n} \sum_{k=1}^{n} P(\mathrm{int} = j \mid s = i, e = k)$,
when picking a path starting in i and ending in k.
Intuitively, a node receives a high betweenness if it has a large 
probability of appearing on paths connecting two arbitrary 
nodes of the network. 

For the group betweenness, the paths are constrained to
start and end in nodes of the same class, therefore defining a
group betweenness between classes,
$\mathrm{gbet}_j(\mathcal{C}_i, \mathcal{C}_k) = P(\mathrm{int} = j \mid s \in \mathcal{C}_i, e \in \mathcal{C}_k)$.
Unlabeled nodes are then classified
according to the class showing the highest group betweenness
when starting and ending within the same class.

In summary, this work has three main contributions: 

• It develops both a betweenness measure and a group 
betweenness measure from a well-founded theoretical 
framework, the bag-of-paths framework introduced in [1].
These two measures can be easily computed in closed 
form. 

• This group betweenness measure provides a new algo- 
rithm for graph-based semi-supervised classification. 

• It assesses the accuracy of the proposed algorithm on 
thirteen standard data sets and compares it to state-of- 
the-art techniques. The obtained performances are highly 
competitive in comparison with the other graph-based 
semi-supervised techniques. 

In this paper, the BoP classifier (or just BoP) will refer to
the semi-supervised classification algorithm based on the
bag-of-paths group betweenness, which is developed in Section V.

The paper is organized as follows. Section II introduces
background and notations, mainly the bag-of-paths and the
bag-of-hitting-paths models. Related work in semi-supervised
classification is then discussed in Section III. The bag-of-paths
betweenness and group betweenness centralities are
introduced in Section IV. This enables us to derive the BoP
classifier in Section V. Experiments involving the BoP
classifier and the classifiers discussed in the related work section
are then performed in Section VI. Results and discussions of
those experiments can be found in Section VI-C. Finally,
Section VII concludes this paper and outlines directions for
further work.

II. Background and notations 

This section aims to introduce the theoretical background
and notations used in this paper. First, graph-based
semi-supervised classification will be discussed in Section II-A; then
the bag-of-paths model introduced in [1] will be summarized
in Section II-B. Finally, the bag-of-hitting-paths model will be
introduced in Section II-C.



A. Graph-based semi-supervised classification 

Consider a weighted directed graph or network, G, strongly
connected with a set of n nodes $\mathcal{V}$ (or vertices) and a set of
edges $\mathcal{E}$. Also consider a set of classes, $\mathcal{C}$, with the number
of classes equal to m. Each node is assumed to belong to
at most one class, since the class label can also be unknown.
To represent the class memberships, an n x m-dimensional
indicator matrix, $\mathbf{Y}$, is used. On each of its rows, it contains
as entries 1 when the corresponding node belongs to class c
and 0 otherwise (m zeros on row i if node i is unlabeled). The
c-th column of $\mathbf{Y}$ will be denoted $\mathbf{y}^c$. To each edge between
node i and j is associated a positive number $c_{ij} > 0$. This
number represents the immediate cost of transition between
node i and j. If there is no link between i and j, the cost is
assumed to take a large value, denoted by $c_{ij} = \infty$. The cost
matrix $\mathbf{C}$ is an n x n matrix containing the $c_{ij}$ as elements.
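As an illustration of this notation, the following minimal Python sketch builds an indicator matrix and a cost matrix. The helper names, the use of -1 to mark unlabeled nodes, and the choice of costs as inverse affinities are assumptions made for the example only; they are not prescribed by the text above.

```python
import numpy as np

def indicator_matrix(labels, m):
    """Build the n x m indicator matrix Y (hypothetical helper).

    labels[i] is the class index of node i, or -1 if node i is unlabeled
    (the -1 convention is an assumption of this sketch).
    """
    labels = np.asarray(labels)
    Y = np.zeros((labels.size, m))
    known = labels >= 0
    Y[known, labels[known]] = 1.0   # a single 1 on each labeled row
    return Y

def cost_matrix(A):
    """Costs taken as inverse affinities; missing edges get an infinite cost."""
    A = np.asarray(A, dtype=float)
    C = np.full(A.shape, np.inf)
    C[A > 0] = 1.0 / A[A > 0]
    return C

# Three nodes, two classes, node 1 unlabeled.
Y = indicator_matrix([0, -1, 1], m=2)
C = cost_matrix([[0, 2, 0],
                 [2, 0, 4],
                 [0, 4, 0]])
```

The unlabeled node yields an all-zero row of Y, matching the convention described above.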



Moreover, a natural random walk on G is defined in the
standard way. In node i, the random walker chooses the next
edge to follow according to reference transition probabilities

$$
p^{\mathrm{ref}}_{ij} = \frac{1/c_{ij}}{\sum_{j' \in \mathrm{Succ}(i)} 1/c_{ij'}} \tag{1}
$$

representing the probability of jumping from node i to node
$j \in \mathrm{Succ}(i)$, the set of successor nodes of i. The corresponding
transition probabilities matrix will be denoted as $\mathbf{P}^{\mathrm{ref}}$. In
other words, the random walker chooses to follow an edge with
a probability proportional to the inverse of the immediate cost
(apart from the sum-to-one normalization), therefore favoring
edges having a low cost.
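The row normalization described above can be sketched as follows. This is a minimal example, assuming that missing edges are encoded by an infinite cost and that every node has at least one successor; the function name and the small cost matrix are illustrative only.

```python
import numpy as np

def reference_transition_matrix(C):
    """Row-stochastic matrix whose entries are proportional to 1 / c_ij.

    C[i, j] = np.inf encodes a missing edge, so that 1 / c_ij = 0.
    Assumes every node has at least one successor.
    """
    C = np.asarray(C, dtype=float)
    inv_cost = 1.0 / C                       # np.inf maps to 0.0
    return inv_cost / inv_cost.sum(axis=1, keepdims=True)

inf = np.inf
C = np.array([[inf, 1.0, 2.0],
              [1.0, inf, inf],
              [2.0, 1.0, inf]])
P_ref = reference_transition_matrix(C)
```

From node 0, the edge of cost 1 is followed with probability 2/3 and the edge of cost 2 with probability 1/3, so the cheaper edge is favored.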

B. The bag-of-paths framework 

The framework introduced in [1] is extended in this paper in
order to define new betweenness measures. The bag-of-paths
(BoP) model can be considered as a motif-based model [9],
[10] using, as building blocks, paths of the network. In the next
section, hitting paths will be used instead, as motifs. The BoP
framework is based on the probability of picking a path $i \rightsquigarrow j$
starting at a node i and ending in a node j from a virtual bag
containing all possible paths of the network. Let us define $\mathcal{P}_{ij}$
as the set of all possible paths connecting node i to node j,
including loops. Let us also define the set of all paths through
the graph as $\mathcal{P} = \cup_{i,j=1}^{n} \mathcal{P}_{ij}$. Each path is weighted according
to its total cost so that the likelihood of picking a low-cost
path is higher than that of picking a high-cost path (low-cost paths
are therefore favoured). The total cost of a path $\wp$, $c(\wp)$, is
defined as the sum of the individual transition costs $c_{ij}$ along
$\wp$. A path $\wp$ (also called a walk) is a sequence of transitions
to adjacent nodes on G (loops are allowed), initiated from a
starting node s, and stopping in an ending node e.

The potentially infinite set of paths in the graph is enumerated
and a probability distribution is assigned to each
individual path: the longer (high-cost) the path, the smaller
the probability of following it. This probability distribution
depends on the inverse-temperature parameter, $\theta = 1/T > 0$,
controlling the exploration carried out in the graph. In [1], the
authors assume that the probability of picking a path $\wp$ from
the bag follows a Boltzmann distribution (see [1] for details):

$$
P(\wp) = \frac{\pi^{\mathrm{ref}}(\wp) \exp[-\theta c(\wp)]}{\sum_{\wp' \in \mathcal{P}} \pi^{\mathrm{ref}}(\wp') \exp[-\theta c(\wp')]} \tag{2}
$$

which is derived in [1] from a cost minimization perspective
subject to a relative entropy constraint. Recall that $\mathcal{P}$ is the
set of all paths through the graph and $\pi^{\mathrm{ref}}(\wp)$ is the product of
the transition probabilities $p^{\mathrm{ref}}_{ij}$ along the path $\wp$. As expected,
short (low-cost) paths are favored since they have a larger
probability of being picked. Furthermore, when $\theta \to 0^{+}$,
the path probabilities reduce to the probabilities given by the
natural random walk on the graph. On the other hand, when
$\theta$ becomes large, the probability distribution defined by (2) is
more and more biased towards shorter paths (the most likely
paths are the shortest ones).
The bag-of-paths probability is the quantity $P(s = i, e = j)$.
It is defined as the probability of drawing a path starting
from node i and ending in node j from the bag-of-paths:

$$
P(s = i, e = j) = \frac{\sum_{\wp \in \mathcal{P}_{ij}} \pi^{\mathrm{ref}}(\wp) \exp[-\theta c(\wp)]}{\sum_{\wp' \in \mathcal{P}} \pi^{\mathrm{ref}}(\wp') \exp[-\theta c(\wp')]} \tag{3}
$$

where it is assumed for the reference probabilities that the
starting and ending nodes are selected according to a uniform
probability. In [1], the authors have also shown that this
probability can be easily calculated as

$$
P(s = i, e = j) = \frac{z_{ij}}{\sum_{i'=1}^{n} \sum_{j'=1}^{n} z_{i'j'}} = \frac{z_{ij}}{\mathcal{Z}}, \quad \text{with } \mathbf{Z} = (\mathbf{I} - \mathbf{W})^{-1} \tag{4}
$$

where $\mathcal{Z} = \sum_{i=1}^{n} \sum_{j=1}^{n} z_{ij}$ is the partition function and $z_{ij}$
is the element i, j of matrix $\mathbf{Z}$. In [1], matrix $\mathbf{Z}$ is called the
fundamental matrix and is computed from the n x n matrix

$$
\mathbf{W} = \mathbf{P}^{\mathrm{ref}} \circ \exp[-\theta \mathbf{C}] \tag{5}
$$

where $\circ$ is the elementwise (Hadamard) matrix product and
the logarithm and exponential functions are taken elementwise.
The entries of $\mathbf{W}$ are therefore $w_{ij} = p^{\mathrm{ref}}_{ij} \exp[-\theta c_{ij}]$. Notice
that $P(e = j \mid s = i)$ is not symmetric and that variables $z_{ij}$
are defined as [1]

$$
z_{ij} = \sum_{\wp \in \mathcal{P}_{ij}} \pi^{\mathrm{ref}}(\wp) \exp[-\theta c(\wp)] \tag{6}
$$

We now turn to a variant of the bag-of-paths, the bag-of- 
hitting-paths. 
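The closed-form computation of the bag-of-paths probabilities from the matrices W and Z can be sketched numerically as follows. This is only an illustrative example: the small cost matrix, the value of the inverse temperature and the function name are assumptions, and the comparison with a truncated power series is added here as a sanity check of the interpretation of Z as a sum over paths of all lengths.

```python
import numpy as np

def bag_of_paths_probabilities(P_ref, C, theta):
    """Return (Z, P) with Z = (I - W)^-1 and P[i, j] = z_ij / sum of all z."""
    W = P_ref * np.exp(-theta * C)           # exp(-inf) = 0 on missing edges
    Z = np.linalg.inv(np.eye(W.shape[0]) - W)
    return Z, Z / Z.sum()

inf = np.inf
theta = 1.0                                  # assumed inverse temperature
C = np.array([[inf, 1.0, 2.0],
              [1.0, inf, inf],
              [2.0, 1.0, inf]])
P_ref = (1.0 / C) / (1.0 / C).sum(axis=1, keepdims=True)
Z, P = bag_of_paths_probabilities(P_ref, C, theta)

# Sanity check: Z should match the truncated series I + W + W^2 + ...,
# whose term W^t accumulates the weights of all paths of length t.
W = P_ref * np.exp(-theta * C)
series = sum(np.linalg.matrix_power(W, t) for t in range(200))
```

Since the rows of W sum to less than one for any positive inverse temperature, the series converges and the inverse exists.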

C. The bag-of-hitting-paths framework 

The idea behind the bag-of-hitting-paths model is the same
as the bag-of-paths model but the set of paths is now restricted
to trajectories in which the ending node does not appear more
than once, i.e. it only appears at the end of the path. In other
words, no intermediate node on the path is allowed to be the
ending node j (node j is made absorbing) and the motifs are
now the hitting paths. Hitting paths will play an important role
in the derivation of the BoP betweenness. In that case, it can
be shown [1] that the probability of drawing a hitting path
$i \rightsquigarrow j$ is

$$
P_h(s = i, e = j) = \frac{z^h_{ij}}{\sum_{i',j'=1}^{n} z^h_{i'j'}} \tag{7}
$$

with $z^h_{ij} = z_{ij}/z_{jj}$. The partition function for the
bag-of-hitting-paths is therefore

$$
\mathcal{Z}_h = \sum_{i,j=1}^{n} z^h_{ij} = \sum_{i,j=1}^{n} \frac{z_{ij}}{z_{jj}} \tag{8}
$$

More information about the bag-of-hitting-paths model can
be found in [1]. Let us simply mention that it can further be
shown that the variables $z^h_{ij}$ are defined as

$$
z^h_{ij} = \sum_{\wp \in \mathcal{P}^h_{ij}} \pi^{\mathrm{ref}}(\wp) \exp[-\theta c(\wp)] \tag{9}
$$

where $\mathcal{P}^h_{ij}$ is now the set of hitting (or absorbing) paths from i
to j. Finally, it was also shown in [1] that $z^h_{ij}$ can be interpreted
as either:

• The expected reward collected by an agent (the reward
along a path $\wp$ being defined as $\exp[-\theta c(\wp)]$) when
traveling from i to j along all possible paths $\wp \in \mathcal{P}^h_{ij}$
with probability $\pi^{\mathrm{ref}}(\wp)$.

• The expected number of passages through node j for an
evaporating random walker starting in node i and walking
according to the sub-stochastic transition probabilities
$p^{\mathrm{ref}}_{ij} \exp[-\theta c_{ij}]$.
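The relation between the hitting-path quantities and the fundamental matrix can be checked numerically with the following sketch. The function names, the small graph and the inverse temperature are assumptions made for the example. The cross-check makes one target node absorbing by zeroing its outgoing weights, so that only paths ending at their first visit to that node remain.

```python
import numpy as np

def hitting_fundamental_matrix(Z):
    """z^h_ij = z_ij / z_jj: each column j of Z is divided by z_jj."""
    return Z / np.diag(Z)[np.newaxis, :]

def hitting_path_probabilities(Z):
    """Bag-of-hitting-paths probabilities, normalized by their total sum."""
    Zh = hitting_fundamental_matrix(Z)
    return Zh / Zh.sum()

inf = np.inf
theta = 1.0                                  # assumed inverse temperature
C = np.array([[inf, 1.0, 2.0],
              [1.0, inf, inf],
              [2.0, 1.0, inf]])
P_ref = (1.0 / C) / (1.0 / C).sum(axis=1, keepdims=True)
W = P_ref * np.exp(-theta * C)
n = W.shape[0]
Z = np.linalg.inv(np.eye(n) - W)
Zh = hitting_fundamental_matrix(Z)
Ph = hitting_path_probabilities(Z)

# Cross-check for target node j = 2: zero the outgoing weights of j so that
# j becomes absorbing, then column j of the new fundamental matrix only
# accumulates paths that reach j for the first time at their last step.
j = 2
W_abs = W.copy()
W_abs[j, :] = 0.0
Z_abs = np.linalg.inv(np.eye(n) - W_abs)
```

Column j of `Z_abs` should coincide with column j of `Zh`, which is the content of the relation between the hitting and regular quantities stated above.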

III. Related work 

Graph-based semi-supervised classification has been the
subject of intensive research in recent years and a wide range
of approaches has been developed in order to tackle the problem
[6], [7], [11], [12]: random-walk-based methods [13],
[14], spectral methods [15], [16], regularization frameworks
[17]-[19], transductive and spectral SVM [20], to name a
few. We will compare our method (the BoP) to some of those
techniques, namely,

1) A simple alignment with the regularized laplacian kernel
(RL) based on a sum of similarities, $\mathbf{K}\mathbf{y}^c$, where
$\mathbf{K} = (\mathbf{I} + \lambda \mathbf{L})^{-1}$, $\mathbf{L} = \mathbf{D} - \mathbf{A}$ is the laplacian matrix,
$\mathbf{I}$ is the identity matrix, $\mathbf{D}$ is the generalized outdegree
matrix, and $\mathbf{A}$ is the adjacency matrix of G [18], [21],
[22]. The similarity is computed for each class c in turn.
Then, each node is assigned to the class showing the
largest similarity. The (scalar) parameter $\lambda > 0$ is the
regularization parameter [23], [24].

2) A simple alignment with the regularized normalized
laplacian kernel (RNL) based on a sum of similarities,
$\mathbf{K}\mathbf{y}^c$, where $\mathbf{K} = (\mathbf{I} + \lambda \tilde{\mathbf{L}})^{-1}$, and $\tilde{\mathbf{L}} = \mathbf{D}^{-1/2} \mathbf{L} \mathbf{D}^{-1/2}$
is the normalized laplacian matrix [?], [25]. The assignment
to the classes is the same as for the previous method.
The regularized normalized laplacian approach seems
less sensitive to the priors of the different classes than
the un-normalized regularized laplacian approach (RL).

3) A simple alignment with the regularized commute time
kernel (RCT) based on a sum of similarities, $\mathbf{K}\mathbf{y}^c$,
with $\mathbf{K} = (\mathbf{D} - \alpha \mathbf{A})^{-1}$ [?], [23]. The assignment
to the classes is the same as for the previous methods.
The element i, j of this kernel can be interpreted as
the discounted cumulated probability of visiting node
j when starting from node i. The (scalar) parameter
$\alpha \in \,]0, 1]$ corresponds to an evaporating or killing
random walk where the random walker has a $(1 - \alpha)$
probability of disappearing at each step. This method
provided the best results in a recent comparative study
on semi-supervised classification [23].

4) The harmonic function (HF) approach [5], [11] is
closely related to the regularization framework of RL
and RNL. It is based on a structural contiguity measure
that smoothes the predicted values and leads to a model
having interesting interpretations in terms of electrical
potential and absorbing probabilities in a Markov chain.

5) The random walk with restart (RWWR) classifier [3],
[26], [27] relies on random walks performed on the
weighted graph seen as a Markov chain. More precisely,
a group betweenness measure is derived for each class,
based on the stationary distribution of a random walk
restarting from the labeled nodes belonging to a class
of interest. Each unlabeled node is then assigned to
the class showing maximal betweenness. In this version
[23], the random walker has a probability $(1 - \alpha)$ to
be teleported, with a uniform probability, to a node
belonging to the class of interest c.

6) The discriminative random walks approach (D-walks or
DW1; see [2]) also relies on random walks performed
on the weighted graph seen as a Markov chain. As
for the RWWR, a group betweenness measure, based
on passage times during random walks, is derived for
each class. However, this time, the group betweenness is
computed between two groups of nodes and not a single
class as for the RWWR method. More precisely, a D-walk
is a random walk starting in a labeled node and
ending when any node having the same label (possibly
the starting node itself) is reached for the first time.
During this random walk, the number of visits to any
unlabeled node is recorded and corresponds to a group
betweenness measure. As for the previous method, each
unlabeled node is then assigned to the class showing
maximal betweenness.

7) A modified version of the D-walks (or DW2). The only
difference is that all elements of the transition matrix
$\mathbf{P}^{\mathrm{ref}}$ (since the random walk is seen as a Markov chain)
are multiplied by $\alpha \in \,]0, 1]$ so that $\alpha$ can be seen as a
probability of continuing the random walk at each time
step (and so $(1 - \alpha) \in [0, 1[$ is the probability at each
step of stopping the random walk). This defines a killing
random walk since $\alpha \mathbf{P}^{\mathrm{ref}}$ is now sub-stochastic.
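The "sum of similarities" rule shared by the kernel-based baselines (RL and RCT) can be sketched as follows. This is a minimal illustration, not the implementation used in the experiments: the function names, the toy graph (two triangles joined by one edge), the parameter values and the choice of an undirected graph are assumptions, and the RCT kernel is taken here as the inverse of D minus alpha times A.

```python
import numpy as np

def regularized_laplacian_kernel(A, lam):
    """K = (I + lam * L)^-1 with L = D - A."""
    D = np.diag(A.sum(axis=1))
    return np.linalg.inv(np.eye(A.shape[0]) + lam * (D - A))

def regularized_commute_time_kernel(A, alpha):
    """K = (D - alpha * A)^-1, assuming 0 < alpha < 1 for invertibility."""
    D = np.diag(A.sum(axis=1))
    return np.linalg.inv(D - alpha * A)

def classify_by_sum_of_similarities(K, Y):
    """Assign each node to the class c with the largest entry of K y^c."""
    return (K @ Y).argmax(axis=1)

# Toy undirected graph: triangles {0, 1, 2} and {3, 4, 5} joined by edge 2-3.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

# One labeled node per class: node 0 in class 0, node 5 in class 1.
Y = np.zeros((6, 2))
Y[0, 0] = 1.0
Y[5, 1] = 1.0

labels_rl = classify_by_sum_of_similarities(
    regularized_laplacian_kernel(A, lam=1.0), Y)
labels_rct = classify_by_sum_of_similarities(
    regularized_commute_time_kernel(A, alpha=0.9), Y)
```

On this toy graph both kernels assign each triangle to the class of its labeled node, since similarities decay when crossing the bridge.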



All these methods (see Table IV for a summary) will be
compared to the bag-of-paths (BoP) developed in the next
sections. The random-walk-based methods usually suffer from
the fact that the random walker takes overly long, and thus
irrelevant, paths into account so that popular entries are
intrinsically favored [28], [29]. The bag-of-paths approach
tackles this issue by putting a negative exponential term in
(2) and part of its success can be imputed to this fact.

Some authors also considered bounded (or truncated) walks
[24], [30], [31] and obtained promising results on large graphs.
This approach could also be considered in our framework in
order to tackle large networks; this will be investigated in
further work.

Tong et al. suggested a method that avoids inverting
an n x n matrix when computing the random walk with restart
measure [26]. They reduce the computing time by partitioning
the input graph into smaller communities. Then, a sparse
approximation of the random walk with restart is obtained by
applying a low rank approximation. This approach suffers
from the fact that it adds a hyperparameter k (the number
of communities) that depends on the network and is still
intractable for large graphs with millions of nodes. On the
other hand, the computing time is reduced by this same factor
k. This is another path to investigate in further work.

Herbster et al. [32] proposed a technique for fast label
prediction on graphs through the approximation of the graph with
either a minimum spanning tree or a shortest path tree. Once
the tree has been extracted, the pseudoinverse of the laplacian
matrix can be computed efficiently. The fast computation of
the pseudoinverse makes it possible to address prediction problems
on large graphs. Finally, Tang and Liu have investigated
relational learning via latent social dimensions [33]-[35]. They
proposed to first extract latent social dimensions based on network
information (such as Facebook, Twitter, ...) and then to use
these as features for discriminative learning (via an SVM for
example [33]). Their approach tackles very large networks and
provides promising results, especially when only a few labeled
data are available.

We also defined a group betweenness using Freeman's, or
shortest path, betweenness [36] and a modified version of
Newman's betweenness [37]. For this last one, the transition
probabilities were set to $\mathbf{P}^{\mathrm{ref}}$, and the ending node of the
walk was forced to be absorbing. Then, the expected number
of visits to each node was recorded and cumulated for
each input-output path. However, our BoP group betweenness
outperformed these two other class betweenness measures
(consequently, results are not reported in this paper).

IV. The bag-of-paths betweennesses 

In order to define the BoP classifier, we need to introduce
the BoP group betweenness centrality. This concept is itself
an extension of the BoP betweenness centrality. This section
therefore starts with the BoP betweenness centrality concept in
Subsection IV-A. Then, its extension, the BoP group betweenness
centrality, is described in Subsection IV-B.



A. The bag-of-paths betweenness centrality 

The BoP betweenness measure quantifies to which extent a
node j lies in between other pairs of nodes i, k, and therefore
is an important intermediary between nodes. Recall from
(4) that the probability of drawing a path starting at node i (s = i)
and ending in node k (e = k) from a regular bag-of-paths is
$P(s = i, e = k) = z_{ik}/\mathcal{Z}$.

We now compute the probability
$P(s = i, \mathrm{int} = j, e = k; i \neq j \neq k \neq i)$, or $P_{ijk}$ in short, that such paths visit an
intermediate node int = j when $i \neq j \neq k \neq i$. From the
Boltzmann distribution on paths,

$$
P_{ijk} = \frac{\sum_{\wp \in \mathcal{P}_{ik}} \delta(j \in \wp)\, \pi^{\mathrm{ref}}(\wp) \exp[-\theta c(\wp)]}{\sum_{\wp' \in \mathcal{P}} \pi^{\mathrm{ref}}(\wp') \exp[-\theta c(\wp')]} \tag{10}
$$



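where $\delta(j \in \wp)$ is the indicator function, i.e. is equal to 1 if
the path $\wp$ contains (at least once) node j, and 0 otherwise.

We will now use the fact that each path $\wp_{ik}$ between i and k
passing through j can be decomposed uniquely into a hitting
sub-path $\wp_{ij}$ from i to j and a regular sub-path $\wp_{jk}$ from j
to k. The sub-path $\wp_{ij}$ is found by following path $\wp_{ik}$ until
reaching j for the first time¹. Therefore, for $i \neq j \neq k \neq i$,

$$
\begin{aligned}
P_{ijk} &= \frac{1}{\mathcal{Z}} \sum_{\wp_{ij} \in \mathcal{P}^h_{ij}} \sum_{\wp_{jk} \in \mathcal{P}_{jk}} \pi^{\mathrm{ref}}(\wp_{ij})\, \pi^{\mathrm{ref}}(\wp_{jk}) \exp[-\theta c(\wp_{ij})] \exp[-\theta c(\wp_{jk})] \\
&= \frac{1}{\mathcal{Z}} \left( \sum_{\wp_{ij} \in \mathcal{P}^h_{ij}} \pi^{\mathrm{ref}}(\wp_{ij}) \exp[-\theta c(\wp_{ij})] \right) \left( \sum_{\wp_{jk} \in \mathcal{P}_{jk}} \pi^{\mathrm{ref}}(\wp_{jk}) \exp[-\theta c(\wp_{jk})] \right) \\
&= \mathcal{Z}_h \left( \frac{\sum_{\wp_{ij} \in \mathcal{P}^h_{ij}} \pi^{\mathrm{ref}}(\wp_{ij}) \exp[-\theta c(\wp_{ij})]}{\mathcal{Z}_h} \right) \left( \frac{\sum_{\wp_{jk} \in \mathcal{P}_{jk}} \pi^{\mathrm{ref}}(\wp_{jk}) \exp[-\theta c(\wp_{jk})]}{\mathcal{Z}} \right) \\
&= \mathcal{Z}_h\, P_h(s = i, e = j)\, P(s = j, e = k), \quad \text{for } i \neq j \neq k \neq i
\end{aligned}
\tag{11}
$$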

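Thus, from Equations (3) and (7),

$$
\begin{aligned}
P_{ijk} &= \mathcal{Z}_h\, P_h(s = i, e = j)\, P(s = j, e = k) \\
&= \mathcal{Z}_h\, \frac{z^h_{ij}}{\mathcal{Z}_h}\, \frac{z_{jk}}{\mathcal{Z}} \\
&= \frac{1}{\mathcal{Z}}\, \frac{z_{ij}\, z_{jk}}{z_{jj}}, \quad \text{for } i \neq j \neq k \neq i.
\end{aligned}
\tag{12}
$$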



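Since $P(s = i, \mathrm{int} = j, e = k)$ is only meaningful when
$i \neq j \neq k \neq i$, the probability is set to zero whenever this
condition is false. From (10) and (11), we therefore obtain

$$
P(s = i, \mathrm{int} = j, e = k; i \neq j \neq k \neq i) = \frac{1}{\mathcal{Z}}\, \frac{z_{ij}\, z_{jk}}{z_{jj}}\, \delta(i \neq j \neq k \neq i) \tag{13}
$$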



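Now, the a posteriori probabilities of visiting intermediate
node j given that the path starts in i and ends in k are therefore
(remember that $i \neq j \neq k \neq i$)

$$
\begin{aligned}
P(\mathrm{int} = j \mid s = i, e = k; i \neq j \neq k \neq i)
&= \frac{P(s = i, \mathrm{int} = j, e = k; i \neq j \neq k \neq i)}{\sum_{j'=1}^{n} P(s = i, \mathrm{int} = j', e = k; i \neq j' \neq k \neq i)} \\
&= \frac{z_{ij}\, z_{jk} / z_{jj}}{\sum_{j'=1}^{n} (1 - \delta_{ij'})(1 - \delta_{j'k})\, z_{ij'}\, z_{j'k} / z_{j'j'}}\; \delta(i \neq j \neq k \neq i)
\end{aligned}
\tag{14}
$$

where we assumed that the node k can be reached from node
i and we used (13).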

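Based on these a posteriori probabilities, the bag-of-paths
betweenness of node j is defined as the sum of the a posteriori
probabilities of visiting j for all possible starting-destination
pairs i, k:

$$
\begin{aligned}
\mathrm{bet}_j &= \sum_{i=1}^{n} \sum_{k=1}^{n} P(\mathrm{int} = j \mid s = i, e = k; i \neq j \neq k \neq i) \\
&= \frac{1}{z_{jj}} \sum_{i=1}^{n} \sum_{k=1}^{n} \frac{\delta(i \neq j \neq k \neq i)\, z_{ij}\, z_{jk}}{\sum_{j'=1}^{n} (1 - \delta_{ij'})(1 - \delta_{j'k})\, z_{ij'}\, z_{j'k} / z_{j'j'}}
\end{aligned}
\tag{15}
$$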



This quantity indicates to which extent a node j lies in 
between pairs of nodes, and therefore to which extent j is an 
important intermediary in the network. 

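Let us now derive the matrix formula computing the
betweenness vector $\mathbf{bet}$. This vector contains the n
betweennesses, one for each node. First of all, the normalization
factor will be computed,
$n_{ik} = \sum_{j'=1}^{n} (1 - \delta_{ij'})(1 - \delta_{j'k})\,(z_{ij'}\, z_{j'k}) / z_{j'j'}$,
appearing in the denominator of (15). We easily see that
$n_{ik} = \sum_{j'=1}^{n} \{(1 - \delta_{ij'})\, z_{ij'}\}\{1/z_{j'j'}\}\{(1 - \delta_{j'k})\, z_{j'k}\}$.
Therefore, by defining
$\mathbf{D}_z^{-1} = (\mathbf{Diag}(\mathbf{Z}))^{-1}$ whose diagonal contains elements
$1/z_{j'j'}$, the matrix containing the normalization factors $n_{ik}$
is $\mathbf{N} = (\mathbf{Z} - \mathbf{Diag}(\mathbf{Z}))\, \mathbf{D}_z^{-1}\, (\mathbf{Z} - \mathbf{Diag}(\mathbf{Z}))$.

Moreover, the term
$\sum_{i=1}^{n} \sum_{k=1}^{n} \delta(i \neq j \neq k \neq i)\, z_{ij}\, (1/n_{ik})\, z_{jk}$
appearing in (15) can be rewritten as
$\sum_{i=1}^{n} \sum_{k=1}^{n} \{(1 - \delta_{ji})\, z^{T}_{ji}\}\{(1 - \delta_{ik})(1/n_{ik})\}\{(1 - \delta_{kj})\, z^{T}_{kj}\}$
where $z^{T}_{ij}$ is the element i, j of matrix $\mathbf{Z}^{T}$ (the transpose of
$\mathbf{Z}$). In matrix form, $\mathbf{bet}$ is therefore equal to

$$
\mathbf{bet} = \mathbf{D}_z^{-1}\, \mathbf{diag}\!\left[ (\mathbf{Z}^{T} - \mathbf{Diag}(\mathbf{Z}))\, (\mathbf{N}^{\div} - \mathbf{Diag}(\mathbf{N}^{\div}))\, (\mathbf{Z}^{T} - \mathbf{Diag}(\mathbf{Z})) \right] \tag{16}
$$

where matrix $\mathbf{N}^{\div}$ contains elements $n^{\div}_{ik} = 1/n_{ik}$, with
$\mathbf{N} = (\mathbf{Z} - \mathbf{Diag}(\mathbf{Z}))\, \mathbf{D}_z^{-1}\, (\mathbf{Z} - \mathbf{Diag}(\mathbf{Z}))$. Moreover, for a
given matrix $\mathbf{M}$, $\mathbf{diag}(\mathbf{M})$ is a column vector containing
the diagonal of $\mathbf{M}$ while $\mathbf{Diag}(\mathbf{M})$ is a diagonal matrix
containing the diagonal of $\mathbf{M}$.

¹ This is the reason why we introduced hitting paths.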
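The matrix formula for the betweenness vector can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name, the toy graph (two triangles joined by one edge, unit costs) and the inverse temperature are chosen for the example, and the graph is assumed to have at least three nodes with all entries of Z positive so that every normalization factor is nonzero.

```python
import numpy as np

def bop_betweenness(Z):
    """BoP betweenness vector computed from the fundamental matrix Z."""
    dz_inv = 1.0 / np.diag(Z)                # diagonal of Dz^-1
    Z0 = Z - np.diag(np.diag(Z))             # Z - Diag(Z)
    N = (Z0 * dz_inv) @ Z0                   # (Z - Diag(Z)) Dz^-1 (Z - Diag(Z))
    N_div = 1.0 / N                          # elementwise reciprocals 1 / n_ik
    np.fill_diagonal(N_div, 0.0)             # N_div - Diag(N_div)
    M = Z0.T @ N_div @ Z0.T                  # Z^T - Diag(Z) equals Z0^T
    return dz_inv * np.diag(M)

# Toy graph: triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
C = np.where(A > 0, 1.0, np.inf)             # unit cost on every edge
theta = 1.0                                  # assumed inverse temperature
P_ref = A / A.sum(axis=1, keepdims=True)
W = P_ref * np.exp(-theta * C)
Z = np.linalg.inv(np.eye(6) - W)
bet = bop_betweenness(Z)
```

Because the a posteriori probabilities sum to one over the intermediate node for every ordered pair of distinct endpoints, the entries of `bet` add up to n(n - 1); on this toy graph the two bridge nodes 2 and 3 receive the largest values.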

B. The bag-of-paths group betweenness centrality 

Let us now generalize the bag-of-paths betweenness to a
group betweenness measure. Quite naturally, the bag-of-paths
group betweenness of node j will be defined as

$$
\mathrm{gbet}_j(\mathcal{C}_i, \mathcal{C}_k) = P(\mathrm{int} = j \mid s \in \mathcal{C}_i, e \in \mathcal{C}_k; s \neq \mathrm{int} \neq e \neq s) \tag{17}
$$

and can be interpreted as the extent to which the node j lies
in between the two sets of nodes $\mathcal{C}_i$ and $\mathcal{C}_k$. It is assumed that
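the sets $\mathcal{C}_i$ ($i = 1, \ldots, m$) are disjoint. Using Bayes' law provides

$$
\begin{aligned}
P(\mathrm{int} = j \mid s \in \mathcal{C}_i, e \in \mathcal{C}_k; s \neq \mathrm{int} \neq e \neq s)
&= \frac{P(s \in \mathcal{C}_i, \mathrm{int} = j, e \in \mathcal{C}_k; s \neq \mathrm{int} \neq e \neq s)}{P(s \in \mathcal{C}_i, e \in \mathcal{C}_k; s \neq \mathrm{int} \neq e \neq s)} \\
&= \frac{\sum_{i' \in \mathcal{C}_i} \sum_{k' \in \mathcal{C}_k} P(s = i', \mathrm{int} = j, e = k'; s \neq \mathrm{int} \neq e \neq s)}{\sum_{j'=1}^{n} \sum_{i' \in \mathcal{C}_i} \sum_{k' \in \mathcal{C}_k} P(s = i', \mathrm{int} = j', e = k'; s \neq \mathrm{int} \neq e \neq s)}
\end{aligned}
\tag{18}
$$

Substituting (13) for the probabilities in (18) allows us to compute
the group betweenness measure in terms of the elements
of the fundamental matrix $\mathbf{Z}$:

$$
\begin{aligned}
\mathrm{gbet}_j(\mathcal{C}_i, \mathcal{C}_k)
&= \frac{\dfrac{1}{\mathcal{Z}} \sum_{i' \in \mathcal{C}_i} \sum_{k' \in \mathcal{C}_k} \delta(i' \neq j \neq k' \neq i')\, \dfrac{z_{i'j}\, z_{jk'}}{z_{jj}}}{\dfrac{1}{\mathcal{Z}} \sum_{j'=1}^{n} \sum_{i' \in \mathcal{C}_i} \sum_{k' \in \mathcal{C}_k} \delta(i' \neq j' \neq k' \neq i')\, \dfrac{z_{i'j'}\, z_{j'k'}}{z_{j'j'}}} \\
&= \frac{\sum_{i' \in \mathcal{C}_i} \sum_{k' \in \mathcal{C}_k} \delta(i' \neq j \neq k' \neq i')\, \dfrac{z_{i'j}\, z_{jk'}}{z_{jj}}}{\sum_{j'=1}^{n} \sum_{i' \in \mathcal{C}_i} \sum_{k' \in \mathcal{C}_k} \delta(i' \neq j' \neq k' \neq i')\, \dfrac{z_{i'j'}\, z_{j'k'}}{z_{j'j'}}}
\end{aligned}
\tag{19}
$$

where the denominator is simply a normalization factor ensuring
that the probability distribution sums to one. It is therefore
sufficient to compute the numerator and then normalize the
resulting quantity.

Let us put this expression in matrix form. We first define
$\mathbf{D}_z = \mathbf{Diag}(\mathbf{Z})$ ($\mathbf{D}_z$ is just the matrix $\mathbf{Z}$ where all non-diagonal
elements are set to zero) and $z^{T}_{ij}$ as element i, j of
matrix $\mathbf{Z}^{T}$ (the transpose of $\mathbf{Z}$). Here again, it is assumed that
nodes i' and k' belong to different sets, $\mathcal{C}_i \neq \mathcal{C}_k$, so that i'
and k' are necessarily different. Therefore, if $\mathbf{y}_k$ is a binary
membership vector indicating which nodes belong to class $\mathcal{C}_k$
(as described in Section II-A), the numerator of (19) can be
rewritten as (remembering that $\mathcal{C}_i \neq \mathcal{C}_k$)

$$
\begin{aligned}
\mathrm{numerator}(\mathrm{gbet}_j(\mathcal{C}_i, \mathcal{C}_k))
&= \frac{1}{z_{jj}} \sum_{i' \in \mathcal{C}_i} \sum_{k' \in \mathcal{C}_k} (1 - \delta_{ji'})(1 - \delta_{jk'})\, z_{i'j}\, z_{jk'} \\
&= \frac{1}{z_{jj}} \left( \sum_{i' \in \mathcal{C}_i} (1 - \delta_{ji'})\, z_{i'j} \right) \left( \sum_{k' \in \mathcal{C}_k} (1 - \delta_{jk'})\, z_{jk'} \right)
\end{aligned}
\tag{20}
$$

Consequently, in matrix form, the group betweenness vector
reads

$$
\begin{aligned}
&\mathbf{gbet}(\mathcal{C}_i, \mathcal{C}_k) \leftarrow \mathbf{D}_z^{-1} \left( (\mathbf{Z}_0^{T} \mathbf{y}_i) \circ (\mathbf{Z}_0 \mathbf{y}_k) \right), \quad \text{with } \mathbf{Z}_0 = \mathbf{Z} - \mathbf{Diag}(\mathbf{Z}), \\
&\mathbf{gbet}(\mathcal{C}_i, \mathcal{C}_k) \leftarrow \frac{\mathbf{gbet}(\mathcal{C}_i, \mathcal{C}_k)}{\| \mathbf{gbet}(\mathcal{C}_i, \mathcal{C}_k) \|_1} \quad \text{(normalization)}
\end{aligned}
\tag{21}
$$

where $\circ$ is the elementwise multiplication (Hadamard product)
and we assume $i \neq k$. In this equation, the vector $\mathbf{gbet}(\mathcal{C}_i, \mathcal{C}_k)$
must be normalized by dividing it by its $L_1$ norm. Notice that
$\mathbf{Z}_0 = \mathbf{Z} - \mathbf{Diag}(\mathbf{Z})$ is simply the fundamental matrix whose
diagonal is set to zero.

V. Semi-supervised classification through the bag-of-paths group betweenness

In this section, the bag-of-hitting-paths model will be used
for classification purposes. From (17) and (19), recall that
the bag-of-paths group betweenness measure was defined as

$$
\begin{aligned}
\mathrm{gbet}_j(\mathcal{C}_i, \mathcal{C}_k)
&= P(\mathrm{int} = j \mid s \in \mathcal{C}_i, e \in \mathcal{C}_k; s \neq \mathrm{int} \neq e \neq s) \\
&= \frac{\sum_{i' \in \mathcal{C}_i} \sum_{k' \in \mathcal{C}_k} \delta(i' \neq j \neq k' \neq i')\, \dfrac{z_{i'j}\, z_{jk'}}{z_{jj}}}{\sum_{j'=1}^{n} \sum_{i' \in \mathcal{C}_i} \sum_{k' \in \mathcal{C}_k} \delta(i' \neq j' \neq k' \neq i')\, \dfrac{z_{i'j'}\, z_{j'k'}}{z_{j'j'}}}
\end{aligned}
\tag{22}
$$

and, as before, the denominator of (22) aims to normalize
the probability distribution so that it sums to one. We will
therefore compute the numerator of (22) and then normalize
the resulting quantity.

Notice, however, that in the derivation of the matrix form
of the group betweenness (see (21)), it was assumed that $\mathcal{C}_i \neq \mathcal{C}_k$.
We will now recompute this quantity when starting and
ending in the same class c, i.e. calculating $\mathrm{gbet}_j(\mathcal{C}_c, \mathcal{C}_c)$. This
will provide a measure of the extent to which nodes of G
are in between, and therefore in the neighbourhood of,
class c. A within-class betweenness is thus defined for each
class c and each node will be assigned to the class showing
the highest betweenness. The main hypothesis underlying this
classification technique is that a node is likely to belong to the
same class as its "neighboring nodes". This is usually called
the local consistency assumption (also called smoothness and
cluster assumption [5], [12], [38]).

The same reasoning as for deriving (21) is applied in order
to compute the numerator of (22),

$$
\begin{aligned}
\mathrm{numerator}(\mathrm{gbet}_j(\mathcal{C}_c, \mathcal{C}_c))
&= \frac{1}{z_{jj}} \sum_{i' \in \mathcal{C}_c} \sum_{k' \in \mathcal{C}_c} \delta(i' \neq j \neq k' \neq i')\, z_{i'j}\, z_{jk'} \\
&= \frac{1}{z_{jj}} \sum_{i' \in \mathcal{C}_c} \sum_{k' \in \mathcal{C}_c} (1 - \delta_{ji'})(1 - \delta_{i'k'})(1 - \delta_{jk'})\, z_{i'j}\, z_{jk'} \\
&= \frac{1}{z_{jj}} \sum_{i' \in \mathcal{C}_c} \sum_{k' \in \mathcal{C}_c} (1 - \delta_{ji'})(1 - \delta_{jk'})\, z_{i'j}\, z_{jk'}
 - \frac{1}{z_{jj}} \sum_{i' \in \mathcal{C}_c} \sum_{k' \in \mathcal{C}_c} (1 - \delta_{ji'})\, \delta_{i'k'}\, (1 - \delta_{jk'})\, z_{i'j}\, z_{jk'} \\
&= \frac{1}{z_{jj}} \left( \sum_{i' \in \mathcal{C}_c} (1 - \delta_{ji'})\, z_{i'j} \right) \left( \sum_{k' \in \mathcal{C}_c} (1 - \delta_{jk'})\, z_{jk'} \right)
 - \frac{1}{z_{jj}} \sum_{i' \in \mathcal{C}_c} (1 - \delta_{ji'})\, z_{i'j}\, (1 - \delta_{ji'})\, z_{ji'} \\
&= \frac{1}{z_{jj}} \left( \sum_{i'=1}^{n} (1 - \delta_{ji'})\, z^{T}_{ji'}\, y^{c}_{i'} \right) \left( \sum_{k'=1}^{n} (1 - \delta_{jk'})\, z_{jk'}\, y^{c}_{k'} \right)
 - \frac{1}{z_{jj}} \sum_{i'=1}^{n} \left( (1 - \delta_{ji'})\, z^{T}_{ji'} \right) \left( (1 - \delta_{ji'})\, z_{ji'} \right) y^{c}_{i'}
\end{aligned}
\tag{23}
$$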



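where $y^{c}_{i}$ is a binary indicator indicating if node i belongs
to the class c (so that $y^{c}_{i}$ is equal to $y_{ic}$, the element (i, c)
of matrix $\mathbf{Y}$) and $z^{T}_{ij}$ is the element i, j of matrix $\mathbf{Z}^{T}$ (the
transpose of $\mathbf{Z}$).

Now, we easily observe that $(1 - \delta_{ji'})\, z_{ji'}$ is element j, i'
of matrix $\mathbf{Z}_0 = \mathbf{Z} - \mathbf{Diag}(\mathbf{Z})$. This expression can thus be
re-expressed in matrix form as

$$
\mathrm{numerator}(\mathbf{gbet}(\mathcal{C}_c, \mathcal{C}_c)) = \mathbf{D}_z^{-1} \left[ (\mathbf{Z}_0^{T} \mathbf{y}^{c}) \circ (\mathbf{Z}_0 \mathbf{y}^{c}) - (\mathbf{Z}_0^{T} \circ \mathbf{Z}_0)\, \mathbf{y}^{c} \right], \quad \text{with } \mathbf{Z}_0 = \mathbf{Z} - \mathbf{Diag}(\mathbf{Z}) \tag{24}
$$

where $\circ$ is the elementwise multiplication (Hadamard product).
After having computed this equation, the numerator must be
normalized in order to obtain $\mathbf{gbet}(\mathcal{C}_c, \mathcal{C}_c)$ (see (22)).

Finally, if we want to classify a node, $\mathbf{gbet}(\mathcal{C}_c, \mathcal{C}_c)$ is
computed for each class c in turn and then, for each node,
the class showing the maximal betweenness is chosen,

$$
\begin{aligned}
&\hat{\boldsymbol{\ell}} = \arg\max_{c \in \mathcal{C}} \left( \mathbf{gbet}(\mathcal{C}_c, \mathcal{C}_c) \right), \quad \text{with} \\
&\mathbf{Z}_0 \leftarrow \mathbf{Z} - \mathbf{Diag}(\mathbf{Z}) \quad \text{(set diagonal to 0)} \\
&\mathbf{gbet}(\mathcal{C}_c, \mathcal{C}_c) \leftarrow \mathbf{D}_z^{-1} \left[ (\mathbf{Z}_0^{T} \mathbf{y}^{c}) \circ (\mathbf{Z}_0 \mathbf{y}^{c}) - (\mathbf{Z}_0^{T} \circ \mathbf{Z}_0)\, \mathbf{y}^{c} \right] \\
&\mathbf{gbet}(\mathcal{C}_c, \mathcal{C}_c) \leftarrow \frac{\mathbf{gbet}(\mathcal{C}_c, \mathcal{C}_c)}{\| \mathbf{gbet}(\mathcal{C}_c, \mathcal{C}_c) \|_1} \quad \text{(normalization)}
\end{aligned}
\tag{25}
$$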

The pseudo-code for the BoP classifier can be found in Al- 
gorithm 1. Of course, once computed, the group betweenness 
is only used for the unlabeled nodes. 
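The classification rule can be put together in a short Python sketch. This is a minimal illustration under stated assumptions rather than a reference implementation: the function name, the toy graph (two triangles joined by one edge), the cost choice (inverse affinities, infinite on missing edges) and the inverse temperature are chosen for the example, and each class is assumed to contain at least two labeled nodes so that its within-class betweenness vector is not identically zero.

```python
import numpy as np

def bop_classifier(A, C, Y, theta):
    """Within-class BoP group betweenness for every class, one column per class.

    A: n x n affinity matrix, C: n x n cost matrix (np.inf on missing edges),
    Y: n x m indicator matrix, theta: inverse temperature (theta > 0).
    Returns (labels, G), where labels[i] is the argmax class of node i.
    """
    n = A.shape[0]
    P_ref = A / A.sum(axis=1, keepdims=True)       # reference random walk
    W = P_ref * np.exp(-theta * C)
    Z = np.linalg.inv(np.eye(n) - W)               # fundamental matrix
    dz_inv = 1.0 / np.diag(Z)
    Z0 = Z - np.diag(np.diag(Z))                   # set diagonal to 0
    # Numerators for all classes at once (columns of Y are the y^c vectors).
    G = dz_inv[:, np.newaxis] * ((Z0.T @ Y) * (Z0 @ Y) - (Z0.T * Z0) @ Y)
    G = G / np.abs(G).sum(axis=0, keepdims=True)   # L1 normalization per class
    return G.argmax(axis=1), G

# Toy graph: triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
C = np.full(A.shape, np.inf)
C[A > 0] = 1.0 / A[A > 0]

# Nodes 0, 1 are labeled class 0; nodes 4, 5 class 1; nodes 2, 3 are unlabeled.
Y = np.zeros((6, 2))
Y[[0, 1], 0] = 1.0
Y[[4, 5], 1] = 1.0

labels, G = bop_classifier(A, C, Y, theta=1.0)
unlabeled = [2, 3]
predicted = labels[unlabeled]
```

Only the rows of the unlabeled nodes are read off at the end, in line with the remark above: a labeled node cannot lie in between itself and its own class, so its within-class score is not informative.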

VI. Experimental comparisons 

In this section, the bag-of-paths group betweenness ap- 
proach for semi-supervised classification (referred to as the 
BoP classifier for simplicity) will be compared to other semi- 
supervised classification techniques on multiple data sets. 
The different classifiers to which the BoP classifier will be
compared were already introduced in Section III and are
recalled in Table IV.

The goal of the experiments of this section is to classify 
unlabeled nodes in medium-size partially labeled graphs and 
to compare the different methods in terms of classification 
accuracy. This comparison is performed on medium-size net- 
works only since kernel approaches are difficult to compute on 
large networks. The computational tractability of the methods 
used in this experimental section will also be analyzed. 

This section is organized as follows. First, the data sets
used for the semi-supervised classification will be described
in Subsection VI-A. Second, the experimental methodology
is detailed in Subsection VI-B. Third, the results will be
discussed in Subsection VI-C. Finally, the computation time
will be investigated in Subsection VI-D.

A. Datasets 

The different classifiers are compared on 14 data sets that 
were used previously for semi-supervised classification: nine 



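Algorithm 1 Classification through the bag-of-paths group betweenness algorithm.

Input:
- A weighted directed graph $G$ containing $n$ nodes, represented
  by its $n \times n$ adjacency matrix $\mathbf{A}$, containing affinities.
- The $n \times n$ cost matrix $\mathbf{C}$ associated to $G$ (usually, the costs
  are the inverse of the affinities, but other choices are possible).
- $m$ binary indicator vectors $\mathbf{y}^{c}$ containing as entries 1 for nodes
  belonging to the class whose label index is $c$, and 0 otherwise.
  Classes are mutually exclusive.
- The inverse temperature parameter $\theta$.

Output:
- The $n \times m$ membership matrix $\mathbf{U}$ containing the membership
  of each node $i$ to class $k$.

 1: $\mathbf{D} \leftarrow \mathbf{Diag}(\mathbf{A}\mathbf{e})$ {the row-normalization matrix}
 2: $\mathbf{P}^{\mathrm{ref}} \leftarrow \mathbf{D}^{-1}\mathbf{A}$ {the reference transition probabilities matrix}
 3: $\mathbf{W} \leftarrow \mathbf{P}^{\mathrm{ref}} \circ \exp[-\theta \mathbf{C}]$ {elementwise exponential and multiplication $\circ$}
 4: $\mathbf{Z} \leftarrow (\mathbf{I} - \mathbf{W})^{-1}$ {the fundamental matrix}
 5: $\mathbf{Z}_0 \leftarrow \mathbf{Z} - \mathbf{Diag}(\mathbf{Z})$ {set diagonal to zero}
 6: $\mathbf{D}_z \leftarrow \mathbf{Diag}(\mathbf{Z})$
 7: $\mathbf{U} \leftarrow \mathbf{Zeros}(n, m)$ {initialize the membership matrix}
 8: for $c = 1$ to $m$ do
 9:    $\mathbf{y}^{*}_{c} \leftarrow \mathbf{D}_z^{-1} \left[ (\mathbf{Z}_0^{\mathrm{T}} \mathbf{y}^{c}) \circ (\mathbf{Z}_0 \mathbf{y}^{c}) - (\mathbf{Z}_0^{\mathrm{T}} \circ \mathbf{Z}_0) \, \mathbf{y}^{c} \right]$ {compute the group betweenness for class $c$; $\circ$ is the elementwise multiplication (Hadamard product)}
10:    $\mathbf{y}^{*}_{c} \leftarrow \mathbf{y}^{*}_{c} / \| \mathbf{y}^{*}_{c} \|_1$ {normalize the betweenness scores}
11: end for
12: $\hat{\boldsymbol{\ell}} \leftarrow \arg\max_{c} (\mathbf{y}^{*}_{c})$ {each node is assigned to the class showing the largest class betweenness}
13: for $i = 1$ to $n$ do
14:    $u_{i \hat{\ell}_i} \leftarrow 1$ {compute the elements of the membership matrix}
15: end for
16: return $\mathbf{U}$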



TABLE I
Class distribution of the IMDb-prodco data set.

Class            IMDb
High-revenue      572
Low-revenue       597
Total            1169



Newsgroups data sets [39], the four universities WebKB cocite
data sets [40], [17] and the IMDb-prodco data set [40].

Newsgroups: The Newsgroups data set is composed of about
20,000 unstructured documents, taken from 20 discussion
groups (newsgroups) of the Usenet diffusion list. 20 classes
(or topics) were originally present in the data set. For our
experiments, nine subsets related to different topics are ex- 
tracted from the original data set pT) , resulting in a total of 
nine different data sets. The data sets were built by sampling 
about 200 documents at random in each topic (three samples 
of two, three and five classes, thus nine samples in total). 
The class distribution is listed in Table II. The extraction process as
well as the procedure used for building the graph are detailed
in [41].

WebKB cocite: These data sets consist of sets of web pages 
gathered from four computer science departments (four data 

2 The different data sets used for these comparisons are described
in Subsection VI-A. Implementations and datasets are available at
http://www.isys.ucl.ac.be/staff/lebichot/research.htm

TABLE II
Class distribution of the nine Newsgroups data sets. Newsgroup 1-3 contain two classes, Newsgroup 4-6 contain three classes and
Newsgroup 7-9 contain five classes.

Class    NG1   NG2   NG3   NG4   NG5   NG6   NG7   NG8   NG9
1        200   198   200   200   200   197   200   200   200
2        200   200   199   200   198   200   200   200   200
3          -     -     -   200   200   198   200   198   197
4          -     -     -     -     -     -   200   200   200
5          -     -     -     -     -     -   198   200   200
Total    400   398   399   600   598   595   998   998   997



[Figure 1: three panels, Newsgroup NG1 (2 classes), Newsgroup NG2 (2 classes) and Newsgroup NG3 (2 classes), with one curve per classifier.]

Fig. 1. Classification rates in percent, averaged over 20 runs, obtained on partially labeled graphs. Results are reported for the eight methods (RL, RNL,
RCT, HF, RWWR, DW1, DW2, BoP) and for five labeling rates (10%, 30%, 50%, 70%, 90%). These graphs show the results obtained on the three two-class
Newsgroups data sets.



TABLE III
Class distribution of the four WebKB cocite data sets.

Class                 Cornell   Texas   Washington   Wisconsin
Course                     54      51          170          83
Department                 25      36           20          37
Faculty                    62      50           44          37
Project                    54      28           39          25
Staff                       6       6           10          11
Student                   145     163          151         155
Total                     346     334          434         348
Majority class (%)       41.9    48.8         39.2        44.5



sets, one for each university), with each page manually labeled
into one of six categories: course, department, faculty, project,
staff, and student [40]. The pages are linked by co-citation (if
x links to z and y links to z, then x and y are co-citing z),
resulting in an undirected graph. The composition of the data
sets is shown in Table III.

IMDb-prodco: The collaborative Internet Movie Database
(IMDb, [40]) has several applications such as making movie
recommendations or movie category classification. The
classification problem focuses on the prediction of the movie
notoriety (whether the movie is a box-office hit or not). The data set
contains a graph of movies linked together whenever they
share the same production company. The weight of an edge in
the resulting graph is the number of production companies
that two movies have in common. The IMDb-prodco class
distribution is shown in Table I.
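Both the co-citation links of WebKB and the shared-production-company links of IMDb-prodco are co-occurrence graphs: two items are connected when they share at least one attribute. A minimal sketch of such a construction is given below, assuming a binary incidence matrix `B` (items by attributes). The helper name `cooccurrence_graph` and the toy data are hypothetical, and whether the WebKB edges carry co-citation counts as weights is an assumption, since the text only states this for IMDb-prodco.

```python
import numpy as np

def cooccurrence_graph(B):
    """Symmetric weight matrix counting the attributes shared by two distinct items.

    B : (n_items, n_attributes) 0/1 integer matrix, e.g. pages x cited pages
        or movies x production companies.
    """
    A = B @ B.T                  # A[x, y] = number of attributes common to x and y
    np.fill_diagonal(A, 0)       # no self-loops
    return A

# Four movies and three production companies (toy data).
B = np.array([[1, 1, 0],
              [1, 1, 0],
              [0, 0, 1],
              [0, 1, 1]])
A = cooccurrence_graph(B)
print(A)
```

Here movies 0 and 1 share two companies and therefore receive an edge of weight 2, while movies 0 and 2 share none and remain unconnected.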



B. Experimental methodology 

[Figure 2: three panels, Newsgroup NG4 (3 classes), Newsgroup NG5 (3 classes) and Newsgroup NG6 (3 classes), with one curve per classifier.]

Fig. 2. Classification rates in percent, averaged over 20 runs, obtained on partially labeled graphs. Results are reported for the eight methods (RL, RNL,
RCT, HF, RWWR, DW1, DW2, BoP) and for five labeling rates (10%, 30%, 50%, 70%, 90%). These graphs show the results obtained on the three three-class
Newsgroups data sets.

[Figure 3: three panels, Newsgroup NG7 (5 classes), Newsgroup NG8 (5 classes) and Newsgroup NG9 (5 classes), with one curve per classifier.]

Fig. 3. Classification rates in percent, averaged over 20 runs, obtained on partially labeled graphs. Results are reported for the eight methods (RL, RNL,
RCT, HF, RWWR, DW1, DW2, BoP) and for five labeling rates (10%, 30%, 50%, 70%, 90%). These graphs show the results obtained on the three five-class
Newsgroups data sets.

Fig. 4. Classification rates in percent, averaged over 20 runs, obtained on partially labeled graphs. Results are reported for the eight methods (RL, RNL,
RCT, HF, RWWR, DW1, DW2, BoP) and for five labeling rates (10%, 30%, 50%, 70%, 90%). These graphs show the results obtained on the four WebKB
cocite data sets.

Fig. 5. Classification rates in percent, averaged over 20 runs, obtained on
partially labeled graphs. Results are reported for the eight methods (RL, RNL,
RCT, HF, RWWR, DW1, DW2, BoP) and for five labeling rates (10%, 30%,
50%, 70%, 90%). These graphs show the results obtained on the IMDb-prodco
data set.

The classification accuracy will be reported for several
labeling rates (10%, 30%, 50%, 70%, 90%), i.e. proportions of
nodes for which the label is known. The labels of the remaining
nodes are deleted during the modeling phase and are used as
test data during the assessment phase. For each considered
labeling rate, 20 random node label deletions were performed
(20 runs) and performances are averaged over these 20 runs. For
each unlabeled node, the various classifiers predict the most
suitable category. Moreover, for each run, a 10-fold nested
cross-validation is performed for tuning the parameters of
the models. The external folds are obtained by 10 successive
rotations of the nodes and the performance of one specific
run is the average over these 10 folds. Moreover, for each
fold of the external cross-validation, a 10-fold internal
cross-validation is performed on the remaining labeled nodes in
order to tune the hyperparameters of the classifiers (i.e.
parameters $\alpha$, $\lambda$ and $\theta$, see Table IV; methods HF and DW1
do not have any hyperparameter). Thus, for each method and
each labeling rate, the mean classification rate averaged over the
20 runs will be reported.
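The outer loop of this protocol (random label deletion at a given labeling rate, repeated over 20 runs) can be sketched as follows. This is a simplified illustration under stated assumptions: the nested cross-validation used for hyperparameter tuning is omitted, and `evaluate` and `majority_baseline` are hypothetical names, the latter being a trivial stand-in for a real classifier.

```python
import numpy as np

def evaluate(classify, labels, rate, runs=20, seed=0):
    """Mean accuracy on the unlabeled nodes over `runs` random label deletions.

    classify(labels, labeled_mask) must return one predicted label per node and
    may only read labels[labeled_mask].
    """
    rng = np.random.default_rng(seed)
    n = labels.size
    n_labeled = int(round(rate * n))
    accuracies = []
    for _ in range(runs):
        labeled = np.zeros(n, dtype=bool)
        labeled[rng.permutation(n)[:n_labeled]] = True   # keep `rate` of the labels
        predicted = classify(labels, labeled)
        accuracies.append(np.mean(predicted[~labeled] == labels[~labeled]))
    return float(np.mean(accuracies))

def majority_baseline(labels, labeled):
    """Stand-in classifier: predicts the most frequent class among labeled nodes."""
    counts = np.bincount(labels[labeled])
    return np.full(labels.size, counts.argmax())

labels = np.array([0] * 90 + [1] * 10)
for rate in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(rate, evaluate(majority_baseline, labels, rate))
```

A graph-based classifier would be plugged in through the `classify` argument, with its hyperparameter tuned by an internal cross-validation restricted to the labeled nodes.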

C. Results & discussion 

Comparative results for each method on the fourteen data
sets are reported as follows: the results on the nine Newsgroups
data sets are shown in Figs. 1-3, the results on the
four WebKB cocite data sets are shown in Fig. 4 and the
results on the IMDb-prodco data set are shown in Fig. 5.

Statistical significance tests for each labeling rate are
detailed in Table V. One-sided t-tests were performed to
determine whether or not the performance of a method is
significantly superior (p-value less than 0.05 on the 20 runs)
to another. Table V can be read as follows. Each entry indicates
on how many data sets (out of a total of 14) the row method was
significantly better than the column method. At the bottom
of each table, the Win/Tie/Lose frequency summarizes how
many times the BoP classifier was significantly better (Win),
was equivalent (Tie), or was significantly worse (Lose) than
each other method.

TABLE IV
The eight classifiers and the value range tested for tuning their parameters.

Classifier name                            Acronym   Parameter              Tested values
Regularized Laplacian kernel               RL        $\lambda > 0$          $10^{-6}, 10^{-5}, \dots, 10^{6}$
Regularized normalized Laplacian kernel    RNL       $\lambda > 0$          $10^{-6}, 10^{-5}, \dots, 10^{6}$
Regularized commute-time kernel            RCT       $\alpha \in \,]0, 1]$  $0.1, 0.2, \dots, 1$
Harmonic function                          HF        /                      /
Random walk with restart                   RWWR      $\alpha \in \,]0, 1]$  $0.1, 0.2, \dots, 1$
Discriminative random walks                DW1       /                      /
Killing discriminative random walks        DW2       $\alpha \in \,]0, 1]$  $0.1, 0.2, \dots, 1$
BoP classifier                             BoP       $\theta > 0$           $10^{-6}, 10^{-5}, \dots, 10^{2}$

Moreover, for each labeling rate, the different classifiers
have been ordered according to a Borda score ranking. For
each data set, each method is granted a certain number
of points, or rating. This number of points is equal to eight
if the classifier is the best classifier (i.e., has the best mean
classification rate on this data set), seven if the classifier is the
second best and so on, so that the worst classifier is granted
only one point. The ratings are then summed across
all the considered data sets and the classifiers are sorted by
descending total rating. The final ranking, together with the
total ratings, is reported in Table VI.
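The Borda rating described above can be sketched in a few lines of Python. The function name `borda_ranking` and the accuracy values are illustrative only, and the handling of ties (here, the arbitrary but stable order of `sorted`) is an assumption, since the text does not specify it.

```python
def borda_ranking(rates):
    """rates: {data set: {method: mean classification rate}}.

    On each data set the worst method receives 1 point and the best one as many
    points as there are methods; points are then summed over the data sets.
    """
    totals = {}
    for per_method in rates.values():
        ordered = sorted(per_method, key=per_method.get)    # worst method first
        for points, method in enumerate(ordered, start=1):
            totals[method] = totals.get(method, 0) + points
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

# Hypothetical rates for three methods on three data sets.
rates = {
    "d1": {"A": 0.90, "B": 0.80, "C": 0.70},
    "d2": {"A": 0.60, "B": 0.85, "C": 0.50},
    "d3": {"A": 0.90, "B": 0.80, "C": 0.95},
}
ranking = borda_ranking(rates)
print(ranking)
```

With eight methods and 14 data sets, the ratings distributed at each labeling rate always add up to 14 x (8 + 7 + ... + 1) = 504, which gives a quick consistency check on the totals of a ranking table.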

We observe that the BoP classifier always achieved competitive
results since it ranks among the top methods on all data
sets. More precisely, the BoP classifier actually tends to be the
best algorithm for all labeling rates except for the 90% labeling
rate, where it comes third, as observed from Table VI and from
Table V. The RCT kernel achieves good performance and is
the best of the kernel-based classifiers (as suggested in [23]). It
is also the best algorithm when the labeling rate is very high
(90%).

Notice that RCT, DW2 and RWWR largely outperform the
other algorithms (besides BoP). However, it is difficult to figure
out which of those three methods is the best, after BoP. It
can be noticed that the DW2 version of the $\mathcal{D}$-walks is more
competitive when the labeling rate is low and that it performs
much better than the DW1 version, especially for low labeling
rates: the Win/Tie/Lose scores for DW2 against DW1 are
7/1/6, 6/1/7, 7/2/7, 13/1/0 and 14/0/0 for labeling rates of 90%,
70%, 50%, 30% and 10%, respectively.

From the fifth to the eighth position, the ranking is less
clear since none of the methods is really better than the others.
However, all of these methods (RL and RNL as well as HF
and DW1) are significantly worse than BoP, RCT, RWWR
and DW2. Notice also that the performance of DW1 and HF
drops significantly when the labeling rate decreases. In addition,
the DW1 algorithm provides surprising results on the
IMDb-prodco data set, reaching a classification rate of only 20%, but
this remains anecdotal.

D. Computation time 

The computational tractability of a method is an important
consideration to take into account. Table VII provides a
comparison of the running time of all methods. To explore
computation time with respect to the number of nodes and
the number of classes, the five-class Newsgroups data set
number seven (NG7) will be used twice, providing the
following variants, NG10 and NG11:

• For NG10, the 499 first nodes are re-labeled class one, 
and the 499 last nodes are relabeled class two. This 
provides a two-classes network with 998 nodes. 



• For NG11, the 100 first nodes are re-labeled class one, 
the 100 following nodes are re-labeled class two, and so 
on to get 10 classes (notice that class 9 and 10 have only 
99 nodes since NG7 has only 998 nodes). This provides 
a ten-classes network with 998 nodes. 

For each method, 100 runs on each of the data sets are
performed and the running time is recorded for each run. The
100 running times are averaged and results are reported in
Table VII.
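The two relabelings of the 998 nodes of NG7 can be reproduced with a short sketch. The helper name `relabel_in_blocks` is an assumption, and the nodes are assumed to be taken in their stored order.

```python
import numpy as np

def relabel_in_blocks(n_nodes, n_classes):
    """Assign consecutive blocks of nodes to classes 1..n_classes."""
    labels = np.empty(n_nodes, dtype=int)
    blocks = np.array_split(np.arange(n_nodes), n_classes)
    for c, block in enumerate(blocks, start=1):
        labels[block] = c
    return labels

ng10 = relabel_in_blocks(998, 2)     # two classes of 499 nodes
ng11 = relabel_in_blocks(998, 10)    # eight classes of 100 nodes, two of 99

print(np.bincount(ng10)[1:])
print(np.bincount(ng11)[1:])
```

`numpy.array_split` gives the larger blocks to the first classes, which matches the description above: classes 9 and 10 of NG11 receive 99 nodes each.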

We observe that HF is one of the quickest methods, but
unfortunately it is not competitive in terms of accuracy, as reported in
Subsection VI-C. Notice that two kernel methods, RL and
RCT, have more or less the same computation time since the
alignment is done in one pass for all the classes. RNL, the last
kernel method, is slower than RL, HF and RCT. After HF
and the kernel methods, the BoP classifier achieves competitive
results with the remaining classifiers. The increase in computation time
when the graph size increases is similar for all methods (except
for RL, for which the increase is smaller), but the BoP
classifier has the same advantage as the kernel methods: its
computation time does not increase strongly when the number
of classes increases. This comes from the algorithm structure:
contrary to RWWR, DW1 and DW2, the BoP classifier does
not require a matrix inversion for each class. Furthermore,
the matrix inversions (or linear systems of equations to solve)
required for the BoP can be computed as soon as the graph
(through its adjacency matrix) is known, which is not the case
with kernel methods. This is a good property for BoP, since it
means that lines 1 to 6 of Algorithm 1 can be pre-computed
once for all folds in the cross-validation.
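This structure can be made explicit with a sketch in which the fundamental matrix is computed once and then reused for every class and every fold. It assumes NumPy, uses a random substochastic matrix `W` as a stand-in for $\mathbf{P}^{\mathrm{ref}} \circ \exp[-\theta \mathbf{C}]$, and the name `all_class_scores` is hypothetical. Since $\mathbf{W}$ depends on $\theta$, the reuse holds for a fixed value of $\theta$.

```python
import numpy as np

def all_class_scores(Z, Y):
    """Normalized group betweenness of every node for all classes at once.

    Z : (n, n) fundamental matrix, computed once from the graph
    Y : (n, m) 0/1 indicator matrix of the currently labeled nodes
    """
    Z0 = Z - np.diag(np.diag(Z))
    G = ((Z0.T @ Y) * (Z0 @ Y) - (Z0.T * Z0) @ Y) / np.diag(Z)[:, None]
    return G / np.abs(G).sum(axis=0, keepdims=True)

rng = np.random.default_rng(1)
n, m = 30, 10
W = rng.random((n, n)) / (2 * n)        # assumption: stand-in for P_ref o exp(-theta C)
Z = np.linalg.inv(np.eye(n) - W)        # the only inversion; it does not involve the labels

true_labels = np.arange(n) % m          # three nodes per class
for fold in range(3):                   # each fold hides one node of every class
    Y = np.eye(m)[true_labels]
    Y[fold::3] = 0
    G = all_class_scores(Z, Y)          # two matrix products, no new inversion
```

The per-class loop of Algorithm 1 collapses into matrix products with the $n \times m$ matrix $\mathbf{Y}$, so adding classes adds columns to these products rather than additional inversions.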

VII. Conclusion 

This paper investigates an application of the bag-of-paths 
framework viewing the graph as a virtual bag from which paths 
are drawn according to a Boltzmann sampling distribution. 

In particular, it introduces a novel algorithm for graph-based
semi-supervised classification through the bag-of-paths
group betweenness, or BoP for short (described in Section V).
The algorithm sums the a posteriori probabilities of
drawing a path visiting a given node of interest according to
a biased sampling distribution, and this sum defines our BoP
betweenness measure. The Boltzmann sampling distribution
depends on a parameter, $\theta$, gradually biasing the distribution
towards shorter paths: when $\theta$ is large, only little exploration
is performed and only the shortest paths are considered while
when $\theta$ is small (close to $0^{+}$), longer paths are considered
and are sampled according to the product of the transition
probabilities $p^{\mathrm{ref}}_{ij}$ along the path (a natural random walk).

TABLE V
One-sided t-test for all labeling rates. Each entry indicates on how many data sets the row method was significantly better than
the column method. At the bottom, the Win/Tie/Lose frequency summarizes how many times the BoP classifier was significantly
better (Win), equivalent (Tie) or significantly worse (Lose) than each other method.

Labeling rate = 90%
                    RL     RNL    RCT    HF     RWWR   DW1    DW2    BoP
RL                  -      9      2      7      4      7      5      4
RNL                 4      -      3      6      4      7      4      4
RCT                 12     10     -      12     3      11     8      9
HF                  6      7      2      -      4      5      5      4
RWWR                10     9      2      12     -      10     8      7
DW1                 5      7      3      2      4      -      6      2
DW2                 9      9      5      9      6      7      -      3
BoP                 7      9      4      8      6      11     6      -
Win/Tie/Lose BoP    7/3/4  9/1/4  4/1/9  8/2/4  6/3/7  11/1/2 6/5/3  total: 14

Labeling rate = 70%
                    RL     RNL    RCT    HF     RWWR   DW1    DW2    BoP
RL                  -      9      2      7      4      7      5      4
RNL                 4      -      3      6      4      7      4      4
RCT                 12     10     -      12     3      11     8      9
HF                  6      7      2      -      4      5      5      4
RWWR                10     9      2      12     -      10     8      7
DW1                 5      7      3      2      4      -      6      2
DW2                 9      9      5      9      6      7      -      3
BoP                 7      9      4      8      6      11     6      -
Win/Tie/Lose BoP    11/3/0 11/1/2 5/4/5  10/2/2 7/5/2  11/1/2 9/1/4  total: 14

Labeling rate = 50%
                    RL     RNL    RCT    HF     RWWR   DW1    DW2    BoP
RL                  -      9      2      7      4      7      5      4
RNL                 4      -      3      6      4      7      4      4
RCT                 12     10     -      12     3      11     8      9
HF                  6      7      2      -      4      5      5      4
RWWR                10     9      2      12     -      10     8      7
DW1                 5      7      3      2      4      -      6      2
DW2                 9      9      5      9      6      7      -      3
BoP                 7      9      4      8      6      11     6      -
Win/Tie/Lose BoP    12/1/1 12/0/2 9/1/4  12/0/2 9/2/3  12/0/2 9/2/3  total: 14

Labeling rate = 30%
                    RL     RNL    RCT    HF     RWWR   DW1    DW2    BoP
RL                  -      9      2      7      4      7      5      4
RNL                 4      -      3      6      4      7      4      4
RCT                 12     10     -      12     3      11     8      9
HF                  6      7      2      -      4      5      5      4
RWWR                10     9      2      12     -      10     8      7
DW1                 5      7      3      2      4      -      6      2
DW2                 9      9      5      9      6      7      -      3
BoP                 7      9      4      8      6      11     6      -
Win/Tie/Lose BoP    13/1/0 12/1/1 10/2/2 14/0/0 9/4/1  14/0/0 11/1/2 total: 14

Labeling rate = 10%
                    RL     RNL    RCT    HF     RWWR   DW1    DW2    BoP
RL                  -      9      2      7      4      7      5      4
RNL                 4      -      3      6      4      7      4      4
RCT                 12     10     -      12     3      11     8      9
HF                  6      7      2      -      4      5      5      4
RWWR                10     9      2      12     -      10     8      7
DW1                 5      7      3      2      4      -      6      2
DW2                 9      9      5      9      6      7      -      3
BoP                 7      9      4      8      6      11     6      -
Win/Tie/Lose BoP    14/0/0 13/1/0 11/4/2 14/0/0 9/1/4  14/0/0 12/1/1 total: 14

TABLE VI
For each labeling rate, the different classifiers are ranked through a Borda rating (see the text for details). The classifiers
are then ranked according to the total rating obtained across all data sets (the larger the better). l stands for labeling rate
and the numbers between parentheses are the total ratings.

Ranking    First       Second      Third       Fourth      Fifth       Sixth       Seventh     Last
l = 90%    RCT (86)    RWWR (74)   BoP (71)    DW2 (69)    RL (53)     HF (53)     RNL (50)    DW1 (48)
l = 70%    BoP (86)    RCT (82)    RWWR (74)   DW2 (65)    HF (59)     DW1 (51)    RL (44)     RNL (43)
l = 50%    BoP (92)    RCT (79)    RWWR (74)   DW2 (73)    HF (50)     DW1 (50)    RNL (46)    RL (40)
l = 30%    BoP (104)   DW2 (83)    RWWR (82)   RCT (77)    RNL (42)    RL (41)     HF (41)     DW1 (34)
l = 10%    BoP (103)   RWWR (89)   DW2 (83)    RCT (82)    RNL (49)    RL (46)     HF (35)     DW1 (17)

TABLE VII
Overview of CPU time in seconds needed to classify all the unlabeled nodes. Results are averaged over 100 runs. The CPU used was
an Intel(R) Core(TM) i3 at 2.13 GHz with 3072 KB of cache and 6 GB of RAM and the programming language is Matlab.

Dataset                         RL      RNL      RCT     HF      RWWR    DW1     DW2     BoP
NG1 (2 classes, 400 nodes)      0.013   0.0433   0.010   0.012   0.036   0.061   0.064   0.051
NG10 (2 classes, 998 nodes)     0.084   0.422    0.070   0.109   0.321   0.623   0.639   0.468
NG11 (10 classes, 998 nodes)    0.086   0.445    0.071   0.107   1.167   2.611   2.683   0.631
Ratio NG10/NG1                  6.28    9.74     7.11    9.07    8.9     10.16   9.98    9.11
Ratio NG11/NG10                 1.03    1.06     1.01    0.98    3.63    4.19    4.20    1.35

Experiments on real-world data sets show that the BoP 
method outperforms the other considered approaches when 
only a few labeled nodes are available. When more nodes are 
labeled, the BoP method is still competitive. The computation 
time of the BoP method is also substantially lower in most of 
the cases. 

Our future work will include several extensions of the pro- 
posed approach. Another interesting issue is how to combine 
the information provided by the graph and the node features 
in a clever, preferably optimal, way. The interest of including 
node features should be assessed experimentally. A typical 
case study could be the labeling of protein-protein interaction 
networks. The node features could involve gene expression 
measurements for the corresponding proteins. 

Yet another application of the bag-of-paths framework could
be the definition of a robustness measure or criticality measure
of the nodes. The idea would be to compute the change in
reachability between nodes when deleting one node within the
BoP framework. Nodes having a large impact on reachability
would then be considered as highly critical.